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Order  statistics  are  often  needed  in  computer 
simulation.  Common  examples  are  quantile  estima- 
tion and  censored  data  test  statistics.  Methods 
for  generating  order  statistics  in  various  contexts 
are  surveyed.  Sorting  and  the  use  of  histograms, 
the  most  general  methods,  are  first  discussed, 
followed  by  a method  for  non- identically  distribu- 
ted samples.  Finally,  the  very  powerful  methods 
applicable  to  iid  random  variables  are  surveyed. 
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I.  INTRODUCTION 
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x be  a set  of  random  observations, 
n 


L(l)’  (2)’ 

smallest  value. 


Let  x^,  x2,  . 

The  associated  order  statistics  are  x, 

...,  x.  . where  x.,.  is  the  JCh 
(n)  (i) 

Although  the  assumption  of  independent  and  identi- 
cally distributed  values  is  commonly  made,  (David 
[7],  Gibbons  [10]), in  simulation  studies  the  inde- 
pendence assumption  need  not  hold  and  the  identically 
distributed  assumption  often  does  not  hold. 

Order  statistics  arise  in  simulation  models  in  a 
number  of  ways.  is  the  time  of  failure  for  a 

system  requiring  i of  n components’  to  operate, 

where  the  distribution  of  the  times  to  failure  for 

each  component  are  not  necessarily  independent  or 

identically  distributed.  In  PliRT  simulation  the 

value  of  x.  the  maximum  of  n observations,  is 
(n) 

the  time  at  which  a future  node  is  realized,  where 
the  observations  may  not  be  independent  and  usually 
are  not  identically  distributed.  In  next-event 
timekeeping,  the  time  of  the  next  event  can  be 
viewed  as  the  first  order  statistic  of  the  times  in 
the  future  events  calander. 

In  other  cases,  order  statistics  may  be  the  desired 
output  rather  than  an  input  of  only  intermediate 
interest.  Queuing  simulations  often  estimate  the 
pth. percentile  of  the  waiting  time  distribution  by 
the  appropriate  order  statistics,  as  do  Monte  Carlo 
studies  of  distributions  of  test  statistics. 
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This  paper  considers  order  statistics  in  the  simu- 
lation context.  In  Section  II,  the  various  ways 
order  statistics  arise  in  simulation  are  categori- 
zed. Later  sections  then  discuss  by  category  the 
methods  available  for  generating  order  statistics. 

II.  CATEGORIZATION  OF  ORDER  STATISTICS 

Order  statistics  in  simulation  can  be  categorized 
several  ways.  In  this  section  a categorization  is 
given  yielding  twelve  separate  situations.  While 
euch  is  different,  in  later  sections  it  will  become 
clear  that  similar  methods  sometimes  apply  to  more 
Chan  one  situation. 

Letting  F ^ denote  the  inverse  cummulative  distri- 
bution- function  for  the  ith  random  variable,  the 
categorization  is<  based  upon 


1. 


Distribution  of  the  x.'s  is 
1 


a. 

known,  but  F 
i - 1.  2,  .... 

is 

n 

not  available  for  some 

b. 

known,  and  FT1 
i * 1 2 ^ 

is 

n 

available  for  all 

2. 

c. 

The 

unknown, 
x^'s  are 

a. 

independent 

b. 

dependent 

3. 

The 

Xj ' s are 

a. 

known,  but  F.'L 

is 

not  available  lor  some 

i * 1.  2,  .... 

n 

b. 

known,  and  F ^ 

1 ■ i 9 

j. 

is 

n 

available  for  all 

2. 

c. 

The 

unknown, 
x^'s  are  — 

a. 

independent 

b. 

dependent 

3. 

The 

x 's  are 

a. 

Identically  distributed 

b. 

not  identically  distributed 

which  results  in  3 x 2 x 2 ■ 12  separate  categories 
of  order  statistics  in  simulation. 

The  first 
which  the 
butlon  of 

partitioning  is  based  on  the  manner  in 
order  statistics  arise.  If  the  distri- 
the  x^'s  is  known,  as  in  the  PERT  and 
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reliability  examples  above,  the  problem  is  to  gen- 
erate order  statistics  for  the  known  distributions. 
On  the  other  hand,  the  simulation  may  be  used  to 
estimate  quantiles  or  entire  distributions,  which 
corresponds  to  the  x^'s  being  the  simulation  out- 
put having  unknown  distributions.  In  the  latter 
case  some  sort  of  selection  or  sorting  must  be  per- 
formed on  the  x^'s  to  obtain  the  ,*ider  statistics. 

In  the  former  case  of  the  distribution  of  the  x^'s 

being  known,  two  situations  arise.  If  the  inverse 
cumulative  distribution  functions,  F"1  i - 1,  2, 
...,  n,  are  available,  efficient  methods  become 
available  for  more  direct  generation  of  order  sta- 
tistics than  selection  or  sorting. 

A second  partition  arises  as  to  whether  the  x^’s 
are  independent.  As  might  be  expected,  independ- 
ence sometimes  leads  to  efficient  methods. 

The  third  partition  is  whether  or  not  the  x 's  are 
identically  distributed.  Identically  distributed 
values  lead  to  better  methods  than  non-identically 
distributed  values. 


III.  SORTING.  SELECTION.  AND  HISTOGRAMS 

A general  method  applicable  to  all  twelve  cases, 

but  not  the  best  for  several  cases,  is  to  generate 

x, , x_,...,  x and  to  sort  the  n values  to  obtain 
i i n 

X(l)’  X(2) X(n)’  *nefflcient  methods  require 

time  proportional  to  n^,  while  efficient  methods 
require  time  proportional  to  n In  n (see  Knuth  [14]). 
There  is  no  option  to  sorting  if  1)  all  order  sta- 
tistics are  needed  and  2)  no  amount  of  approxima- 
tion is  allowed,  except  as  noted  in  Section  IV. 

If  only  some  order  statistics  are  needed,  improve- 
ment over  n In  n sorting  can  be  made.  Chambers  [3] 
gives  algorithm  PSORT  for  partially  sorting  x.,  , 

....  x^  to  obtain  only  specified  order  statistics: 
For  a small  fixed  number  of  specified  order  sta- 
tistics, the  time  for  PSORT  is  nearly  proportional 
to  n.  An  80  line  FORTRAN  implementation  is  given 
in  [3].  (See  also  Chambers  [4].)  An  example  when 
only  a fraction  of  the  order  statistics  are  needed 
would  be  in  plotting  the  distribution  function  frcro 
a sample  of  many  observations.  Determining  every 
tenth,  say,  order  statistic  may  be  satisfactory. 
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Floyd  and  Rivest  [8]  give  an  algorithm  for  select- 
ing the  ich  order  statistic  from  n values.  In  [9] 
chey  show  the  number  of  comparisons  is  n + min 
(k,  n-k)  + o(n) . 

If  some  approximation  is  allowed,  a histogram  can 
be  used.  A histogram  having  tu  <<  n cells  can  be 
used  to  sort  the  observations  in  one  pass  (as  they 
are  generated).  For  discrete  distributions,  histo- 
grams can  often  be  used  with  no  loss  of  information. 
For  continuous  values,  however,  information  is  lost 
in  that  the  c^  values  in  cell  i can  not  be  recover- 
ed completely.  The  cell  containing  the  icl'  order 
statistic  may  be  determined  with  certainty,  but  how 
to  best  determine  the  point  in  the  cell  to  represent 
the  ic*1  order  statistic  is  not  obvious.  The  cell 
mid-point  is  the  most  straight  forward  choice. 

David  and  Mishriky  [6],  who  considered  only  n < 100, 
state  that  "the  effect  of  grouping...  is...  of 
minor  importance  for  h - 0.6  (standard  deviations) 
which  represents  quite  coarse  grouping."  Schmeiser 
and  Deutsch  illustrate,  via  deterministic  calcula- 
tions, three  potentially  serious  idiosyncrasies 
which  occur  with  the  large  sample  sizes  common  in 
quantile  estimation  via  simulation.  Two  are  poten- 
tial problems  concerning  order  statistics  directly: 

1.  The  expected  value  of  the  midpoint 
estimator  can  depend  heavily  upon 
both  the  histogram  cell  width  and 
the  placement* of  Che  histogram. 

2.  Larger  cells  can  resulc  in  smaller 
variance,  thus  incorrectly  indicating 
an  answer  more  accurate  than  actually 
obtained. 

Schmeiser  [22]  suggests  estimating  the  rC*'  order 

statistic  ns  a+b[q-(  ^ c -r+1) / (c  +1) ] where  q is 

i-il  1 9 q 

the  smallest  integer  such  that  £ c^>  r.  This  esti- 

mator behaves  considerably  more  like  the  true  order 
statistic  than  does  the  crude  cell  midpoint 
estimator. 

IV.  METHODS  WHEN  F~X  ARE  AVAILABLE 

When  F^  is  available,  values  of  x^  can  be  generated 

using  F~1(u)  where  u - U (0,1).  This  inverse  trans- 
formation method  leads  to  efficient  methods  for 
order  statistics  when  F^  is  available.  Often  F^ 
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is  available  as  a closed-form  formula,  as  is  the 
case  for  Che  Weibull,  exponential,  Cauchy , double  ex- 
ponential, uniform,  triangular,  and  power  series 
distributions.  The  general  families  of  distributions 
of  Burr  [12],  Ramberg  and  Schmeiser  [19,20]  and  ^ 
Schmeiser  and  Deutsch  [24]  also  have  closed-form  F^. 
It  has  sometimes  been  overlooked  that  numerical 
solution  of  is  equally  valid,  even  if  not  as 
easy  to  implement.  For  example,  most  computer  pack- 
ages have  routines  to  provide  the  uc*’  quancile  of 
the  normal  and  gamma  distributions. 

Schmeiser  [23]  gives  a method  for  reducing  the 
effort  to  generate  the  maximum,  minimum  and  range 
of  a sample  arising  from  non-idencically  distributed 
random  variables  having  easy-co-evaluate  F“^  . 

Based  on  a preliminary  check  of  u^  to  a partition  p., 
the  calculation  of  x^  ■ F“^(u^)  someclmes  avoiued. 

Since  the  evaluation  of  F^is  often  time  consuming, 
substantial  reduction  in  computation  time  can  result 
in  simulations  where  generation  times  play  a signi- 
ficant role.  While  conceptually  valid  for  dependent 
random  yariables,  the  conditional  distribution  func- 
tion (u|x^,  X2 Xi-1^  musC  k®  available. 

Other  than  for  the  multivariate  normal,  this  is  tare. 


The  most  impressive  gains  in  efficiency,  ease  of 
implementation,  and  memory  requirements  are  obtained 
for  independent,  identically  distributed  values  gen- 
erated via  F“1  . Schucany  [27]  gives  the  following 
mechod  for  generating  : 


Let  v , v^,  . . 

Set  u ■v^° 
sec  U(n)  1 * 

general  u(n.1) 


. , vr  be  independent  U(1 ,0)values. 

1/ (n-1)  . , 

u . ,<*  u/  \ vi  and  in 

(n-1)  (n)  2 


1/ (n-i) 

U V 

(n+l-i)  i+1 


Sec  ■ F L(u(i))  for  all  desired  order  statis- 

tics. That  the  ic^  uniform  order  statistic  is  tran- 
formed  directly  co  the  i^h  order  statistic  of  the 
distribution  of  interest  follows  from  F“^being  a 
monotdnically  nondecreasing  function.  This  recur- 
sive algorithm  requires  effort  linear  in  n and  is 
therefore  preferable  co  complete  sorting  for  large 
n.  Even  for  small  n this  approach  is  efficient  if 
only  a few  of  the  extreme  order  statistics  are  re- 
quired. 
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If  che  extreme  order  statistics  are  not  needed, 
this  procedure  can  be  improved  by  noting  that 

i 

u.  - 1 - exp  ( £ (In  v )/(n+l-j)] 

V ' j-1  J 

for  i ■ 1,  2,  n, 

since  vm  is  calculated  as  exp(m  In  v) . Lewis  [15] 
carries  this  a step  further  by  pointing  out  that  it 
is  faster  to  obtain  In  v as  Che  negacive  of  a 
standard  exponential  variate  chan  co  take  the  log- 
arithm of  a randomly  generated  U(0,1)  value  v. 

Either  Marsalia's  method  (see  Knuth  [13])  or  Lewis 
and  Learmouth  [16]  would  be  appropriate  to  generate 
the  exponential  values. 

Lurie  and  Hartley  [17]  published  a mechod  analagous 

co  Schucany's  at  about  che  same  time,  but  began 

their  recursion  at  x...  and  proceeded  coward  x,  . . 

(1)  (n) 

In  addition  to  the  recursive  algorithm,  under  the 
heading  "Simultaneous  generation  of  order  statistics 
for  a multiplicity  of  sample  sizes,"  they  note  that 
Che  ic''  order  statistic  u,  . may  be  generated  as  a 
ratio  of  gamma  random  variables.  (Note  che  erron- 
eous substitution  of  "1"  for  "i"  in  their  equation 
10.)  Ramberg  and  Tadikamalla  [21]noced  more  direct- 
ly chat  u...  has  a beta  distribution  with  parameters 
i and  n+1.  ^Given  che  recent  advances  in  beta  vari- 
ace  generation  (Cheng[5]  and  Schmeiser  and  Shalaby 
[26])  this  mechod  is  quite  efficient  when  only  one 
statistic  from  the  center  of  che  sample  is  needed. 

If  more  chan  one  is  needed  from  the  center  of  the 
sample,  the  recursion  algorithms  can  be  applied 
beginning  at  the  value  generated  as  a beca  variate. 

Rabinowitz  and  Berenson  [18]  compare  the  "grouping 
mechod"  to  the  other  methods  for  independent  identi- 
cally distributed  random  variables  when  all  n order 
statistics  are  needed.  The  grouping  method  consists 
of  partitioning  the  unit  interval  into  several  sub- 
intervals, generating  n U(0,1)  values,  performing 
a permutation  sort  on  each  group  of  observations 
for  each  subinterval,  and  using  F-1  to  generate  the 
x,  .'s  from  the  sorted  u.-.'s.  This  grouping 
method  was  fastesc,  but  Obviously  was  somewhat  more 
difficult  co  implement. 

V.  MISCELLANEOUS  RESULTS 

1.  Sillitto  [31]  gives  some  relationships  between 
expectations  of  order  statistics  which  may  be  use- 
ful when  various  sample  sizes  are  of  incerest. 

2.  Some  recent  work  on  quantile  estimation  not 
referenced  above  included  Goodman,  Lewis  and 
Robbins  [11],  Igleharc  [12],  and  Seila[28,29,30] . 
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